docs: add HIP/AMD NaN warning for q8_0/turbo3 on large K-norm models by brosequist · Pull Request #66 · TheTom/turboquant_plus

brosequist · 2026-04-01T20:24:44Z

Summary

Adds a prominent WARNING block to docs/turboquant-recommendations.md documenting observed NaN divergence when using q8_0 or turbo3 on models with large K-vector norms (e.g. Qwen2.5-7B) on AMD/ROCm (HIP) backends.
Includes recommended mitigations: switch to turbo2/turbo4, or add pre-quantization K-norm clipping.
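
For readers unfamiliar with the second mitigation, here is a minimal sketch of what pre-quantization K-norm clipping could look like. It is an illustration only, not code from turboquant_plus: the function name `clip_k_norm` and the `max_norm` threshold are hypothetical, and a suitable threshold would need to be found empirically per model. The idea is to rescale any K vector whose L2 norm exceeds the threshold before it reaches the quantizer, preserving its direction.

```python
import math


def clip_k_norm(k_vectors, max_norm):
    """Rescale each K vector so its L2 norm is at most max_norm.

    Hypothetical helper for illustration; not part of turboquant_plus.
    Vectors already within the bound are returned unchanged (as copies).
    """
    if max_norm <= 0:
        raise ValueError("max_norm must be positive")

    clipped = []
    for vec in k_vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > max_norm:
            # Shrink the vector uniformly so its direction is preserved.
            scale = max_norm / norm
            clipped.append([x * scale for x in vec])
        else:
            clipped.append(list(vec))
    return clipped


# Usage: the first vector (norm 5) is untouched; the second (norm 500)
# is scaled down to norm 10 before it would be quantized.
keys = [[3.0, 4.0], [300.0, 400.0]]
safe_keys = clip_k_norm(keys, max_norm=10.0)
```

Note that clipping changes the attention logits for the affected keys, so it trades some accuracy for numerical stability; switching to turbo2/turbo4 avoids that trade-off.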

Test plan

Docs-only change, no code to test.

🤖 Generated with Claude Code

Adds a prominent WARNING block to turboquant-recommendations.md documenting the observed NaN divergence when using q8_0 or turbo3 compression on models with large K-vector norms (e.g. Qwen2.5-7B) on AMD/ROCm (HIP) backends. The root cause is the int8 overflow path that differs between HIP and CUDA. Recommended mitigations: switch to turbo2/turbo4 or add pre-quantization K-norm clipping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

brosequist wants to merge 1 commit into TheTom:main from brosequist:docs/hip-nan-warning
